The code and information contained herein constitute the complete write-up of the experiments I carried out for my first Qualifying Paper towards the PhD in Linguistics at Stanford University. The goal is for this document to serve both as a working notebook for my ideas while the project is in progress and, eventually, as a publicly available version of my Qualifying Paper, in the spirit of Open Science.
For this write-up and analysis, I require the following packages, loaded here:
library(ggplot2)
library(tidyverse)
library(lme4)
library(stringr)
library(languageR)
library(lmerTest)
library(reshape2)
library(grid)
source("helpers.R")
I also use a custom color palette, so I include the code for that here as well.
bran_palette = c("#7ae7e5", "#fe5f55", "#B2A6DE", "#14342b", "#69385c")
theme_set(theme_minimal())
We also need the frequency data!
frequency <- read.csv("freq_vals.csv")
lib_cols <- c('ABC','CNN','PBS','NBC','MSNBC','NPR','CBS')
frequency <- frequency %>%
mutate(total_left = rowSums(frequency[lib_cols])) %>% # Total raw count across the left-leaning outlets
mutate(total_right = FOX) %>% # The FOX counts serve as the right-leaning total
mutate(left_wpm = (total_left/109300000) * 1000000) %>% # Convert raw counts to rates per million words
mutate(right_wpm = (total_right/12200000) * 1000000) %>%
mutate(neutral_binary = ifelse(gender=="neutral",1,0)) %>% # 1 = gender-neutral form, 0 = gendered form
mutate(morph_type = ifelse(lexeme!= 'actor' & lexeme!= 'host' & lexeme !='hunter' & lexeme!= 'villain' & lexeme!= 'heir' & lexeme!= 'hero','compound','adoption')) # The six listed lexemes form their neutral by adopting the male form; all others are compounds
frequency_grouped <- frequency %>%
filter(morph_type=="compound") %>%
group_by(lexeme,neutral_binary) %>%
summarise(mean_freq_left = mean(left_wpm), mean_freq_right = mean(right_wpm))
`summarise()` has grouped output by 'lexeme'. You can override using the `.groups` argument.
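The grouping message above is informational, and it can be silenced by setting summarise()'s `.groups` argument explicitly. A minimal sketch on toy data (the toy data frame is invented for illustration):

```r
library(dplyr)

# Toy grouped summary with the same shape as frequency_grouped above
toy <- data.frame(
  lexeme = c("a", "a", "b", "b"),
  neutral_binary = c(0, 1, 0, 1),
  left_wpm = c(1, 2, 3, 4)
)

# .groups = "drop" returns an ungrouped result and suppresses the message
toy_summary <- toy %>%
  group_by(lexeme, neutral_binary) %>%
  summarise(mean_freq_left = mean(left_wpm), .groups = "drop")
```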
write.csv(frequency_grouped, "freq_prep.csv")
lex_freqs <- read.csv("freq_preped.csv") %>% # Assumption: a reshaped version of the freq_prep.csv written above, with separate columns per gender condition
mutate(left_surprisal = (-log(mean_left_neutral))/(-log(mean_left_gendered))) %>%
mutate(right_surprisal = (-log(mean_right_neutral))/(-log(mean_right_gendered)))
Participants
100 participants were recruited through the online recruitment platform Prolific. All participants were self-identified L1 English speakers who were born in, and resided in, the United States at the time of participation. None of the participants had taken part in the 4-person pilot of the norming study, and the pilot data are not reported here.
All participants, regardless of whether their data were ultimately included in the analysis, were compensated $2.00 for taking part in the study. The average completion time for the task was 4.353 minutes, which resulted in an average payout of $31.85/hr.
Experiment
A link to the experiment can be found here; the keen reader is welcome to click through, as results are not stored in any way.
Reading in the Data
I read in the data here, along with some upfront filters and mutations. The purpose of each is given in an accompanying comment.
norming_data <- read.csv("norming_data.csv") %>%
filter(id!="example1") %>% # Will filter out non-critical trials, i.e. the example trial from the beginning of the experiment
mutate(equalized_response = ifelse(scale=="FM",8-response,response)) %>% # This will render all data points on the same scale, as participants randomly received either "very likely a man" or "very likely a woman" as the left end of their response scale, with the other appearing at the right end
mutate(orthog = ifelse(orthog=="sroceress","sorceress",orthog)) %>% # Fixes a typo
mutate(id = ifelse(id=="Stunt_double","Stunt double",id)) %>% # This, as well as all lines below it, convert compounds formed by spaces from their underscore forms to their spaced forms (e.g. police_officer -> Police officer)
mutate(id = ifelse(id=="Restaurant_server","Restaurant server",id)) %>%
mutate(id = ifelse(id=="Police_officer","Police officer",id)) %>%
mutate(id = ifelse(id=="Door_attendant","Door attendant",id)) %>%
mutate(id = ifelse(id=="Flight_attendant","Flight attendant",id)) %>%
mutate(id = ifelse(id=="Garbage_Collector","Garbage collector",id)) %>%
mutate(id = ifelse(id=="Mail_Carrier","Mail carrier",id)) %>%
mutate(id = ifelse(id=="Maintenance_Person","Maintenance person",id)) %>%
mutate(id = ifelse(id=="Paper_carrier","Paper carrier",id))
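The chain of ifelse() calls above can also be written as a single lookup with dplyr's recode(), which takes a named vector of old-to-new values and leaves everything else unchanged. A behavior-equivalent sketch (the toy data frame is invented for illustration):

```r
library(dplyr)

# Named vector mapping underscore forms to their spaced forms
id_fixes <- c(
  "Stunt_double" = "Stunt double",
  "Restaurant_server" = "Restaurant server",
  "Police_officer" = "Police officer",
  "Door_attendant" = "Door attendant",
  "Flight_attendant" = "Flight attendant",
  "Garbage_Collector" = "Garbage collector",
  "Mail_Carrier" = "Mail carrier",
  "Maintenance_Person" = "Maintenance person",
  "Paper_carrier" = "Paper carrier"
)

toy <- data.frame(id = c("Police_officer", "Baker"))

# recode() rewrites any id named in id_fixes and passes the rest through
toy <- toy %>% mutate(id = recode(id, !!!id_fixes))
```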
Generating an Exclusion List
Here I generate a list of participants to exclude, based on the criteria laid out in Section 2.1.1.
norming_exclusion <- norming_data %>%
filter(gender=="female") %>%
group_by(workerid) %>%
summarize(female_mean = mean(equalized_response)) %>%
unique() %>%
mutate(exclusion = female_mean < mean(female_mean) - 2*sd(female_mean)) %>%
filter(exclusion==TRUE)
Now I can exclude these participants from the data under analysis.
norming_data <- norming_data[!(norming_data$workerid %in% norming_exclusion$workerid),]
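The base-R subsetting above can equivalently be expressed with dplyr's anti_join(), which keeps only the rows of the left table with no match in the right. A minimal sketch on invented toy data:

```r
library(dplyr)

# Toy data: three workers, one of whom appears on the exclusion list
dat <- data.frame(workerid = c(1, 2, 3), response = c(5, 6, 7))
excl <- data.frame(workerid = 2)

# anti_join() drops every row of dat whose workerid appears in excl
kept <- anti_join(dat, excl, by = "workerid")
```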
Means by Item
norming_means <- norming_data %>%
group_by(id,gender,orthog) %>%
summarise(indi_mean = mean(equalized_response), trial_count=n())
text_high <- textGrob("Female", gp=gpar(fontsize=10, fontface="bold"))
text_low <- textGrob("Male", gp=gpar(fontsize=10, fontface="bold"))
ggplot(norming_means, aes(x=id, y=indi_mean, color=gender)) +
geom_point() +
theme_minimal() +
theme(axis.text.x = element_text(angle=45, hjust=1, size=8)) +
labs(x="Ungendered Lexical Item", y="Mean Rating", color = "Gender of Form Seen", title="Mean Gender Rating by Ungendered Form and Gender Seen") +
scale_color_manual(values = bran_palette) +
theme(plot.margin = unit(c(1,1,2,1), "lines")) +
annotation_custom(text_high,xmin=-1.3,xmax=-1.3,ymin=7,ymax=7) +
annotation_custom(text_low,xmin=-1,xmax=-1,ymin=1,ymax=1) +
coord_cartesian(clip = "off")
ggsave("norming_results_all.png", width=7,height=4,path='/Users/branpap/Desktop/gender_processing/talks_and_papers/qp_paper/figures')
Participants
We originally ran the experiment on 200 participants, recruited through the online participant recruitment platform Prolific. The mean completion time of the experiment was 5.39 minutes, and participants were paid $1.75 for their participation. The only restrictions placed on participants were that they:
These requirements were implemented to ensure that speakers came from at least somewhat similar linguistic backgrounds, as certain lexical items in the study (such as congressperson) are quite localized to the United States.
After this initial run of the experiment, we found that there was a dearth of conservative or Republican-aligned participants. As a result, we ran the experiment again, this time on 98 self-identified Republicans. This was achieved by adding a filter on Prolific so that only Republican-identified individuals could see the task. The rest of the experiment, including payment, was exactly the same, except for an additional disclaimer that participants could not use the Firefox browser for the experiment, after the first run revealed an incompatibility between the experiment's JavaScript and Firefox. The two runs of the experiment yielded a total of 298 participants who completed the task.
Reading in the Data
sprt_data <- read.csv('sprt_data.csv') %>%
filter(trial_id!= 'example') %>%
filter(region=='critical')
Exclusions
Now, we want to exclude any participants who failed to answer at least 85% of the attention check questions correctly. We do this by creating a list of all participants who scored less than 85% on these checks, and then cross-referencing this list with all data points, removing any data points whose participants were in the exclusion list.
sprt_exclusion <- sprt_data %>% group_by(workerid) %>%
summarise(accuracy = mean(response_correct)) %>%
mutate(exclude = ifelse(accuracy < 0.85,'Yes','No')) %>%
filter(exclude == 'Yes')
sprt_data <- sprt_data[!(sprt_data$workerid %in% sprt_exclusion$workerid),]
We also want to filter out all trials in which the log reading time for the critical item was more than 2 standard deviations from the mean log reading time on that lexical item across all participants.
sprt_data <- sprt_data %>%
group_by(trial_id) %>%
mutate(id_mean = mean(log(rt))) %>%
mutate(exclusion = (log(rt) < mean(log(rt)) - 2*sd(log(rt))|(log(rt) > mean(log(rt)) + 2*sd(log(rt))))) %>%
ungroup() %>%
filter(exclusion==FALSE)
This results in 238 trials being removed from the 5580 we had after the by-participant exclusions, leaving 5342 trials for analysis.
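The 2-SD criterion can be sanity-checked on toy data: a value whose log reading time lies more than two standard deviations from the mean log reading time gets flagged, while typical values do not. A minimal sketch (the reading times are invented):

```r
# Twenty typical reading times plus one extreme outlier
rt <- c(seq(300, 395, by = 5), 5000)
lrt <- log(rt)

# Flag log RTs more than 2 SDs from the mean log RT, mirroring the pipeline above
flagged <- lrt < mean(lrt) - 2 * sd(lrt) | lrt > mean(lrt) + 2 * sd(lrt)
```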
Additional Information
Now that we have only the rows we want, let's add some new columns, which will contain important information for each data point. Here, we will be adding:
Ideally, I would have added all but the first of these when I created the stimuli and logged the responses, but I forgot! Luckily, R lets us do this post hoc fairly straightforwardly, which is good, since these features will be critical in our data visualization and analysis.
The question under investigation here is whether or not individuals’ conceptions of gender affect how they process gendered and gender-neutral forms of English personal and professional titles.
In order to examine this, we need to quantify participants' ideological views! Here we have adopted the 13-item Social Roles Questionnaire put forth in Baber & Tucker (2006). Questions 1-5 correspond to the 'Gender Transcendent' subscale, and questions 6-13 correspond to the 'Gender Linked' subscale. Each item is scored on a scale of 0-100. So, the first thing we want to do is make two lists of columns corresponding to these two subscales, since the questions are stored individually in the data:
gender_transcendence_cols <- c('subject_information.gender_q1','subject_information.gender_q2','subject_information.gender_q3','subject_information.gender_q4','subject_information.gender_q5')
gender_linked_cols <- c('subject_information.gender_q6','subject_information.gender_q7','subject_information.gender_q8','subject_information.gender_q9','subject_information.gender_q10','subject_information.gender_q11','subject_information.gender_q12','subject_information.gender_q13')
Now we can use the mutate() function on sprt_data to add two new columns, one for each subscale. We tell R to take the row-wise means of the specified columns in [column_names] of sprt_data: rowMeans(sprt_data[column_names]). We also have to subtract this mean from 100 in the case of the 'Gender Transcendent' subscale, since it is inversely scored. Finally, we can create an average total score regardless of subscale, simply by averaging the two subscores we already have.
sprt_data <- sprt_data %>%
mutate(gender_trans = 100 - (rowMeans(sprt_data[gender_transcendence_cols]))) %>%
mutate(gender_link = rowMeans(sprt_data[gender_linked_cols]))
gender_all = c('gender_trans','gender_link')
sprt_data <- sprt_data %>%
mutate(gender_total = rowMeans(sprt_data[gender_all]))
We also want to add whether the trial included a female or male referent (but also, like, destroy the binary!). In order to do this, we’ll just add a trial_gender column that says ‘female’ if the condition was either ‘neutral_female’ or ‘congruent_female’. Otherwise, we want the trial_gender to say ‘male’.
sprt_data <- sprt_data %>%
mutate(trial_gender = ifelse(condition=='neutral_female' | condition == 'congruent_female','female','male'))
sprt_data %>%
select(workerid,rt,condition,trial_id,trial_gender)
Now we want to record whether the lexeme's neutral form is formed by compounding (as in 'congress-person') or by the adoption of the male form (as in 'actor' being used for both men and women). In this study, only six lexemes are of the latter type, so we'll just tell R to assign those a morph_type value of 'adoption' (for 'male adoption'), with everything else assigned a value of 'compound'.
sprt_data <- sprt_data%>%
mutate(morph_type = ifelse(lexeme!= 'actor' & lexeme!= 'host' & lexeme !='hunter' & lexeme!= 'villain' & lexeme!= 'heir' & lexeme!= 'hero','compound','adoption'))
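Since only six lexemes take the adoption type, the chain of != comparisons above can equivalently be written with %in%, which tests set membership. A behavior-matched sketch (the toy data frame is invented for illustration):

```r
adoption_lexemes <- c("actor", "host", "hunter", "villain", "heir", "hero")

toy <- data.frame(lexeme = c("actor", "congressperson", "hero"))

# %in% replaces the six chained != tests with a single membership check
toy$morph_type <- ifelse(toy$lexeme %in% adoption_lexemes, "adoption", "compound")
```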
sprt_data %>%
select(rt,lexeme,morph_type)
Another important factor we want to explore is the length of the critical item! To add this, we simply create a new column, form_length, and tell R to fill it with the length of the string that appears in that row's form column, which corresponds to the orthographic form of the critical item in that trial. Note that this will include spaces in the count!
sprt_data <- sprt_data %>%
mutate(form_length = str_length(form))
Now that we have this, we can run a simple linear regression which shows the effect of orthographic length on reading time. We then add a new column containing the residual reading time: the reading time in log space AFTER we control for the effect of orthographic length.
sprt_residual_model <- lm(log(rt)~form_length, data = sprt_data)
sprt_data <- sprt_data %>%
mutate(resid_rt = resid(sprt_residual_model))
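The residualization step can be illustrated on toy data: the residual is the observed log RT minus the value predicted from length alone, so items read faster than their length predicts come out negative. A minimal sketch (the toy values are invented):

```r
# Toy data: log RT grows roughly linearly with form length
toy <- data.frame(
  form_length = c(4, 6, 8, 10, 12, 14),
  rt = c(400, 450, 500, 560, 620, 700)
)

# Regress log RT on length, then keep what the length effect cannot explain
len_model <- lm(log(rt) ~ form_length, data = toy)
toy$resid_rt <- resid(len_model)

# With an intercept in the model, OLS residuals sum to (numerically) zero
sum(toy$resid_rt)
```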
We also want to make sure we have a column which records whether or not the trial was gender-congruent (as in ‘Shelby is a congresswoman’) or gender neutral (as in ‘Shelby is a congressperson’). We add a trial_congruency column, which is valued as ‘congruent’ if that row’s condition is one of the two congruent conditions. Otherwise, it gets valued as ‘neutral’.
sprt_data <- sprt_data %>%
mutate(trial_congruency = ifelse(condition=='congruent_male' | condition == 'congruent_female','congruent','neutral'))
Finally, we can classify participants by their particular political alignment; we can construe this broadly as “Republicans” vs. “Democrats”, with those who declined to state a preference, or placed themselves in the middle, as “Non-Partisan”.
sprt_data <- sprt_data %>%
mutate(poli_party = ifelse(subject_information.party_alignment == 1 | subject_information.party_alignment == 2,'Republican',ifelse(subject_information.party_alignment == 4 | subject_information.party_alignment == 5,'Democrat','Non-Partisan')))
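The nested ifelse() above can equivalently be written with dplyr's case_when(), which is easier to read and extend. A behavior-matched sketch on invented alignment scores (note that the TRUE clause also maps missing alignments to 'Non-Partisan', whereas nested ifelse would propagate NA):

```r
library(dplyr)

toy <- data.frame(subject_information.party_alignment = c(1, 3, 5, 2, 4))

# First matching condition wins; TRUE acts as the catch-all default
toy <- toy %>%
  mutate(poli_party = case_when(
    subject_information.party_alignment %in% c(1, 2) ~ "Republican",
    subject_information.party_alignment %in% c(4, 5) ~ "Democrat",
    TRUE ~ "Non-Partisan"
  ))
```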
Speaker Means
sprt_speaker_means <- sprt_data %>%
group_by(condition,poli_party,workerid) %>%
summarize(MeanRT=mean(resid_rt))
sprt_data %>%
group_by(condition,trial_gender) %>%
summarize(MeanRT = mean(resid_rt), CI.Low = ci.low(resid_rt), CI.High = ci.high(resid_rt)) %>%
mutate(YMin = MeanRT - CI.Low, YMax = MeanRT + CI.High) %>%
ggplot(aes(x=condition,y=MeanRT,color=trial_gender)) +
geom_point(size=3) +
geom_jitter(data = sprt_speaker_means, aes(y=MeanRT),alpha=.1,color='black') +
geom_errorbar(aes(ymin=YMin,ymax=YMax), width=.25) +
scale_color_manual(values = bran_palette)
sprt_whole_means <- sprt_data %>%
group_by(trial_gender,trial_congruency) %>%
summarize(MeanRT = mean(rt), CI.Low = ci.low(rt), CI.High = ci.high(rt)) %>%
mutate(YMin = MeanRT - CI.Low, YMax = MeanRT + CI.High)
dodge = position_dodge(.9)
ggplot(data=sprt_whole_means, aes(x=trial_gender,y=MeanRT,fill=trial_congruency)) +
geom_bar(stat='identity',position=dodge) +
geom_errorbar(aes(ymin=YMin,ymax=YMax),width=.25,position=dodge) +
scale_fill_manual(values = bran_palette)
Reading Time by Gender Ideology
sprt_speaker_means_ideology <- sprt_data %>%
group_by(gender_total,workerid,trial_gender,trial_congruency,poli_party) %>%
summarise(meanrt = mean(resid_rt))
sprt_speaker_means_ideology %>%
filter(!is.na(poli_party)) %>%
ggplot(aes(x=gender_total,y=meanrt,color=trial_gender,linetype=trial_congruency)) +
geom_point() +
geom_smooth(method='lm') +
scale_color_manual(values = bran_palette) +
facet_wrap(~poli_party)
`geom_smooth()` using formula 'y ~ x'
Reading Time on Neologisms
sprt_data %>%
filter(!is.na(poli_party)) %>%
filter(trial_congruency == "neutral") %>%
group_by(gender_total,workerid,trial_gender,poli_party) %>%
summarise(meanrt = mean(resid_rt)) %>%
ggplot(aes(x=gender_total,y=meanrt,color=trial_gender)) +
geom_point() +
geom_smooth(method='lm') +
scale_color_manual(values = bran_palette) +
facet_wrap(~poli_party)
Reading Time by Item
sprt_data %>%
group_by(condition,trial_gender,trial_congruency,lexeme) %>%
summarize(MeanRT = mean(resid_rt), CI.Low = ci.low(resid_rt), CI.High = ci.high(resid_rt)) %>%
mutate(YMin = MeanRT - CI.Low, YMax = MeanRT + CI.High) %>%
ggplot(aes(x=condition,y=MeanRT,color=trial_gender,shape=trial_congruency)) +
geom_point(size=3) +
geom_errorbar(aes(ymin=YMin,ymax=YMax), width=.25) +
facet_wrap(~ lexeme) +
theme(axis.text.x = element_text(angle = 45, vjust = .7, hjust=.7)) +
scale_color_manual(values = bran_palette)
Whole-Party Means
sprt_data %>%
filter(!is.na(poli_party)) %>%
group_by(poli_party,condition,trial_gender,trial_congruency) %>%
summarize(MeanRT = mean(resid_rt), CI.Low = ci.low(resid_rt), CI.High = ci.high(resid_rt)) %>%
mutate(YMin = MeanRT - CI.Low, YMax = MeanRT + CI.High) %>%
ggplot(aes(x=condition,y=MeanRT,color=trial_gender,shape=trial_congruency)) +
geom_point(size=3) +
geom_errorbar(aes(ymin=YMin,ymax=YMax), width=.25) +
facet_wrap(~ poli_party, nrow = 1) +
theme(axis.text.x = element_text(angle = 45, vjust = .7, hjust=.7)) +
scale_color_manual(values = bran_palette)
Modelling
sprt_final_dat <- merge(sprt_data,lex_freqs,by="lexeme") %>%
mutate(cgender_total = scale(gender_total)) %>%
mutate(cage = scale(subject_information.age)) %>%
mutate(cmean_left_neutral = scale(mean_left_neutral)) %>%
mutate(mean_all = (mean_left_neutral + mean_right_neutral)/2) %>%
mutate(cmean_all = scale(mean_all)) %>%
mutate(ctrial_congruency = as.numeric(as.factor(trial_congruency))-mean(as.numeric(as.factor(trial_congruency)))) %>%
mutate(ctrial_gender = as.numeric(as.factor(trial_gender))-mean(as.numeric(as.factor(trial_gender))))
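The centering trick used above, converting a two-level factor to numeric and subtracting the mean, yields deviation-style coding whose values sum to zero in balanced data, so main effects are estimated at the "average" condition. A minimal sketch:

```r
x <- c("congruent", "neutral", "congruent", "neutral")

# as.factor orders levels alphabetically, so 'congruent' = 1 and 'neutral' = 2;
# subtracting the mean centers the predictor at zero
cx <- as.numeric(as.factor(x)) - mean(as.numeric(as.factor(x)))
cx  # -0.5  0.5 -0.5  0.5
```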
complex_model <- lmer(resid_rt~ctrial_congruency*cgender_total*poli_party*cmean_all + (1|workerid) + (1|lexeme) + (1|name),data = sprt_final_dat)
summary(complex_model)
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: resid_rt ~ ctrial_congruency * cgender_total * poli_party * cmean_all + (1 | workerid) + (1 | lexeme) + (1 | name)
Data: sprt_final_dat
REML criterion at convergence: 2983.4
Scaled residuals:
Min 1Q Median 3Q Max
-3.9865 -0.5941 -0.0460 0.5125 4.4044
Random effects:
Groups Name Variance Std.Dev.
workerid (Intercept) 0.1531582 0.39135
name (Intercept) 0.0007211 0.02685
lexeme (Intercept) 0.0019161 0.04377
Residual 0.0998232 0.31595
Number of obs: 3713, groups: workerid, 275; name, 24; lexeme, 14
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) -8.711e-02 4.823e-02 2.726e+02 -1.806 0.07196 .
ctrial_congruency 8.878e-03 2.036e-02 3.411e+03 0.436 0.66275
cgender_total -6.397e-02 5.012e-02 2.669e+02 -1.276 0.20296
poli_partyNon-Partisan 1.270e-01 7.020e-02 2.668e+02 1.809 0.07157 .
poli_partyRepublican 1.954e-01 6.489e-02 2.672e+02 3.011 0.00285 **
cmean_all -2.990e-03 1.542e-02 2.580e+01 -0.194 0.84779
ctrial_congruency:cgender_total 1.106e-02 2.204e-02 3.420e+03 0.502 0.61568
ctrial_congruency:poli_partyNon-Partisan -1.286e-02 3.069e-02 3.413e+03 -0.419 0.67520
ctrial_congruency:poli_partyRepublican 1.080e-02 2.853e-02 3.412e+03 0.379 0.70494
cgender_total:poli_partyNon-Partisan -1.629e-02 7.267e-02 2.669e+02 -0.224 0.82276
cgender_total:poli_partyRepublican -6.669e-02 6.541e-02 2.671e+02 -1.019 0.30890
ctrial_congruency:cmean_all 3.509e-03 2.092e-02 3.423e+03 0.168 0.86678
cgender_total:cmean_all -3.001e-03 1.088e-02 3.395e+03 -0.276 0.78264
poli_partyNon-Partisan:cmean_all 2.016e-02 1.529e-02 3.400e+03 1.319 0.18729
poli_partyRepublican:cmean_all 5.686e-03 1.417e-02 3.399e+03 0.401 0.68823
ctrial_congruency:cgender_total:poli_partyNon-Partisan -6.888e-02 3.187e-02 3.415e+03 -2.161 0.03073 *
ctrial_congruency:cgender_total:poli_partyRepublican -1.059e-02 2.872e-02 3.412e+03 -0.369 0.71229
ctrial_congruency:cgender_total:cmean_all 1.584e-02 2.243e-02 3.425e+03 0.706 0.48006
ctrial_congruency:poli_partyNon-Partisan:cmean_all 3.462e-03 3.157e-02 3.422e+03 0.110 0.91267
ctrial_congruency:poli_partyRepublican:cmean_all -1.306e-02 2.938e-02 3.423e+03 -0.445 0.65665
cgender_total:poli_partyNon-Partisan:cmean_all 3.021e-03 1.590e-02 3.399e+03 0.190 0.84932
cgender_total:poli_partyRepublican:cmean_all -2.510e-03 1.417e-02 3.400e+03 -0.177 0.85941
ctrial_congruency:cgender_total:poli_partyNon-Partisan:cmean_all -1.828e-02 3.282e-02 3.422e+03 -0.557 0.57753
ctrial_congruency:cgender_total:poli_partyRepublican:cmean_all -7.961e-04 2.921e-02 3.423e+03 -0.027 0.97826
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation matrix not shown by default, as p = 24 > 12.
Use print(x, correlation=TRUE) or
vcov(x) if you need it
Data Read-in
maze_data <- read.csv('maze_data.csv') %>%
filter(trial_id!= 'example') %>%
filter(region=='critical')
maze_data %>%
group_by(workerid) %>%
summarise(workerid=paste(unique(workerid))) %>%
nrow()
[1] 198
Running Exclusion Criteria
Now, we want to exclude any participants who failed to answer at least 80% of the attention check questions correctly. We do this by creating a list of all participants who scored less than 80% on these checks, and then cross-referencing this list with all data points, removing any data points whose participants were in the exclusion list.
maze_exclusion <- maze_data %>% group_by(workerid) %>%
summarise(accuracy = mean(response_correct)) %>%
mutate(exclude = ifelse(accuracy < 0.80,'Yes','No')) %>%
filter(exclude == 'Yes')
maze_data <- maze_data[!(maze_data$workerid %in% maze_exclusion$workerid),] %>%
filter(rt != 'null') %>% # Drop trials with no recorded reading time
mutate(rt = as.numeric(rt)) # rt is read in as character because of the 'null' entries, so convert it before taking logs below
maze_data <- maze_data %>%
group_by(trial_id) %>%
mutate(id_mean = mean(log(rt))) %>%
mutate(exclusion = (log(rt) < mean(log(rt)) - 2*sd(log(rt))|(log(rt) > mean(log(rt)) + 2*sd(log(rt))))) %>%
ungroup() %>%
filter(exclusion==FALSE)
Additional Information
maze_data <- maze_data %>%
mutate(gender_trans = 100 - (rowMeans(maze_data[gender_transcendence_cols]))) %>%
mutate(gender_link = rowMeans(maze_data[gender_linked_cols]))
gender_all = c('gender_trans','gender_link')
maze_data <- maze_data %>%
mutate(gender_total = rowMeans(maze_data[gender_all]))
We also want to add whether the trial included a female or male referent (but also, like, destroy the binary!). In order to do this, we’ll just add a trial_gender column that says ‘female’ if the condition was either ‘neutral_female’ or ‘congruent_female’. Otherwise, we want the trial_gender to say ‘male’.
maze_data <- maze_data %>%
mutate(trial_gender = ifelse(condition=='neutral_female' | condition == 'congruent_female','female','male'))
maze_data %>%
select(workerid,rt,condition,trial_id,trial_gender)
Now we want to record whether the lexeme's neutral form is formed by compounding (as in 'congress-person') or by the adoption of the male form (as in 'actor' being used for both men and women). In this study, only six lexemes are of the latter type, so we'll just tell R to assign those a morph_type value of 'adoption' (for 'male adoption'), with everything else assigned a value of 'compound'.
maze_data <- maze_data%>%
mutate(morph_type = ifelse(lexeme!= 'actor' & lexeme!= 'host' & lexeme !='hunter' & lexeme!= 'villain' & lexeme!= 'heir' & lexeme!= 'hero','compound','adoption'))
maze_data %>%
select(rt,lexeme,morph_type)
Another important factor we want to explore is the length of the critical item! To add this, we simply create a new column, form_length, and tell R to fill it with the length of the string that appears in that row's form column, which corresponds to the orthographic form of the critical item in that trial. Note that this will include spaces in the count!
maze_data <- maze_data %>%
mutate(form_length = str_length(form))
simple_model <- lm(log(rt)~form_length, data = maze_data)
maze_data <- maze_data %>%
mutate(resid_rt = resid(simple_model))
summary(simple_model)
Call:
lm(formula = log(rt) ~ form_length, data = maze_data)
Residuals:
Min 1Q Median 3Q Max
-0.83975 -0.27384 -0.05032 0.23306 1.56914
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.92941 0.01911 362.644 < 2e-16 ***
form_length 0.01099 0.00196 5.608 2.21e-08 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.3638 on 3269 degrees of freedom
Multiple R-squared: 0.00953, Adjusted R-squared: 0.009227
F-statistic: 31.45 on 1 and 3269 DF, p-value: 2.212e-08
As in the self-paced reading study, this simple linear regression shows the effect of orthographic length on reading time, and resid_rt records the residual reading time: the reading time in log space AFTER we control for the effect of orthographic length.
We also want to make sure we have a column which records whether or not the trial was gender-congruent (as in ‘Shelby is a congresswoman’) or gender neutral (as in ‘Shelby is a congressperson’). We add a trial_congruency column, which is valued as ‘congruent’ if that row’s condition is one of the two congruent conditions. Otherwise, it gets valued as ‘neutral’.
maze_data <- maze_data %>%
mutate(trial_congruency = ifelse(condition=='congruent_male' | condition == 'congruent_female','congruent','neutral'))
Finally, we can classify participants by their particular political alignment; we can construe this broadly as “Republicans” vs. “Democrats”, with those who declined to state a preference, or placed themselves in the middle, as “Non-Partisan”.
maze_data <- maze_data %>%
mutate(poli_party = ifelse(subject_information.party_alignment == 1 | subject_information.party_alignment == 2,'Republican',ifelse(subject_information.party_alignment == 4 | subject_information.party_alignment == 5,'Democrat','Non-Partisan')))
Visualisations
maze_data %>%
filter(trial_congruency == "neutral") %>%
ggplot(aes(x=gender_total, y=resid_rt, color=trial_congruency)) +
geom_point(alpha=.5) +
geom_smooth(method = 'lm', size=1.2) +
theme_minimal() +
labs(x="Gender Ideology Score", y="Residual Reading Time", color="Trial Congruency") +
scale_color_manual(values=bran_palette) +
theme(legend.position = "none") +
theme(text = element_text(size = 18))
ggsave("maze_neutral_all.png", width=7,height=5,path='/Users/branpap/Desktop/gender_processing/talks_and_papers/qp_paper/figures')
maze_data %>%
filter(gender_total < 75) %>%
ggplot(aes(x=gender_total, y=resid_rt, color=trial_congruency)) +
geom_point(alpha=.5) +
geom_smooth(method = 'lm', size=1.2) +
theme_minimal() +
labs(x="Gender Ideology Score", y="Residual Reading Time", color="Trial Congruency") +
scale_color_manual(values=bran_palette)
ggsave("maze_all_incremental.png", width=7,height=5,path='/Users/branpap/Desktop/gender_processing/talks_and_papers/qp_paper/figures')
maze_data %>%
filter(trial_congruency == "neutral") %>%
ggplot(aes(x=gender_total, y=resid_rt, color=trial_congruency)) +
geom_point(alpha=.5) +
geom_smooth(method = 'lm', size=1.2)
Reading Time by Congruency & Gender
maze_speaker_means <- maze_data %>%
group_by(condition,workerid) %>%
summarize(MeanRT=mean(resid_rt))
maze_data %>%
group_by(condition,trial_gender) %>%
summarize(MeanRT = mean(resid_rt), CI.Low = ci.low(resid_rt), CI.High = ci.high(resid_rt)) %>%
mutate(YMin = MeanRT - CI.Low, YMax = MeanRT + CI.High) %>%
ggplot(aes(x=condition,y=MeanRT,color=trial_gender)) +
geom_point(size=3) +
geom_jitter(data = maze_speaker_means, aes(y=MeanRT),alpha=.1,color='darkred') +
geom_errorbar(aes(ymin=YMin,ymax=YMax), width=.25) +
scale_color_manual(values = bran_palette) +
theme_minimal() +
labs(x="Trial Condition",y="Mean Reading Time (Residual)",color="Trial Gender Seen")
ggsave("maze_all means.png", width=7,height=5,path='/Users/branpap/Desktop/gender_processing/talks_and_papers/qp_paper/figures')
Item Means
maze_data %>%
group_by(condition,trial_gender,trial_congruency,lexeme) %>%
summarize(MeanRT = mean(resid_rt), CI.Low = ci.low(resid_rt), CI.High = ci.high(resid_rt)) %>%
mutate(YMin = MeanRT - CI.Low, YMax = MeanRT + CI.High) %>%
ggplot(aes(x=condition,y=MeanRT,color=trial_gender,shape=trial_congruency)) +
geom_point(size=3) +
geom_errorbar(aes(ymin=YMin,ymax=YMax), width=.25) +
facet_wrap(~ lexeme) +
theme(axis.text.x = element_text(angle = 45, vjust = .7, hjust=.7)) +
scale_color_manual(values = bran_palette) +
facet_wrap(~lexeme)
Morphological Type and Gender
maze_data %>%
group_by(condition,trial_gender,morph_type) %>%
summarize(MeanRT = mean(resid_rt), CI.Low = ci.low(resid_rt), CI.High = ci.high(resid_rt)) %>%
mutate(YMin = MeanRT - CI.Low, YMax = MeanRT + CI.High) %>%
ggplot(aes(x=condition,y=MeanRT,color=trial_gender)) +
geom_point(size=3) +
geom_errorbar(aes(ymin=YMin,ymax=YMax), width=.25) +
scale_color_manual(values = bran_palette) +
facet_wrap(~morph_type) +
theme(axis.text.x = element_text(angle=45, vjust = 0.5))
maze_data %>%
filter(!is.na(poli_party)) %>%
filter(morph_type == 'compound') %>%
filter(poli_party != 'Non-Partisan') %>%
group_by(poli_party,condition,trial_gender,trial_congruency) %>%
summarize(MeanRT = mean(resid_rt), CI.Low = ci.low(resid_rt), CI.High = ci.high(resid_rt)) %>%
mutate(YMin = MeanRT - CI.Low, YMax = MeanRT + CI.High) %>%
ggplot(aes(x=condition,y=MeanRT,color=trial_gender,shape=trial_congruency)) +
geom_point(size=3) +
geom_errorbar(aes(ymin=YMin,ymax=YMax), width=.25) +
facet_wrap(~ poli_party, nrow = 1) +
theme(axis.text.x = element_text(angle = 45, vjust = .7, hjust=.7)) +
scale_color_manual(values = bran_palette)
Reading Time by Congruency and Ideology
maze_data %>%
filter(!is.na(poli_party)) %>%
filter(poli_party != "Non-Partisan") %>%
filter(trial_congruency == "neutral") %>%
group_by(gender_total,workerid,trial_gender,poli_party) %>%
summarise(meanrt = mean(resid_rt)) %>%
ggplot(aes(x=gender_total,y=meanrt,color=trial_gender)) +
geom_point() +
geom_smooth(method='lm') +
scale_color_manual(values = bran_palette) +
facet_wrap(~poli_party) +
labs(x="Gender Ideology Score",y="Mean Residual Reading Time",color="Trial Gender Name") +
theme(text=element_text(size=15))
ggsave("maze_neutral_poli.png", width=7,height=5,path='/Users/branpap/Desktop/gender_processing/talks_and_papers/qp_paper/figures')
# maze_data %>%
#   filter(!is.na(subject_information.party_alignment)) %>%
#   filter(poli_party != "Non-Partisan") %>%
#   filter(trial_congruency == "neutral") %>%
#   group_by(workerid,trial_gender,subject_information.party_alignment) %>%
#   summarise(meanrt = mean(resid_rt)) %>%
#   ggplot(aes(x=subject_information.party_alignment,y=meanrt,color=trial_gender)) +
#   geom_bar() +
#   scale_color_manual(values = bran_palette) +
#   labs(x="Gender Ideology Score",y="Mean Residual Reading Time",color="Trial Gender Name") +
#   theme(text=element_text(size=15))
Model
maze_data <- maze_data %>%
mutate(ctrial_congruency = as.numeric(as.factor(trial_congruency))-mean(as.numeric(as.factor(trial_congruency)))) %>%
mutate(ctrial_gender = as.numeric(as.factor(trial_gender))-mean(as.numeric(as.factor(trial_gender)))) %>%
mutate(cgender_link = scale(gender_link)) %>%
mutate(cgender_total = scale(gender_total)) %>%
mutate(cmorph_type = as.numeric(as.factor(morph_type))-mean(as.numeric(as.factor(morph_type)))) %>%
mutate(cgender = as.numeric(as.factor(subject_information.gender))-mean(as.numeric(as.factor(subject_information.gender))))
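The deviation coding applied above can be illustrated on a small hypothetical factor (a sketch; `toy` and its values are invented for illustration, not part of the experiment):

```r
library(dplyr)

# Hypothetical balanced two-level factor, mirroring trial_congruency
toy <- data.frame(cond = c("gendered", "neutral", "neutral", "gendered"))

toy <- toy %>%
  mutate(ccond = as.numeric(as.factor(cond)) - mean(as.numeric(as.factor(cond))))

# as.factor() codes the levels 1 and 2; subtracting the mean yields
# -0.5 / +0.5 for a balanced factor, so fixed effects are estimated
# at the grand mean rather than at a reference level.
toy$ccond
```

With unbalanced data the codes shift away from ±0.5, since the subtracted mean reflects the observed level proportions.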
complex_model <- lmer(resid_rt~ctrial_congruency*ctrial_gender*cgender_total + (1|workerid) + (1|lexeme),data = maze_data)
summary(complex_model)
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: resid_rt ~ ctrial_congruency * ctrial_gender * cgender_total + (1 | workerid) + (1 | lexeme)
Data: maze_data
REML criterion at convergence: 1235.7
Scaled residuals:
Min 1Q Median 3Q Max
-3.2718 -0.6627 -0.0981 0.5897 4.1441
Random effects:
Groups Name Variance Std.Dev.
workerid (Intercept) 0.04665 0.2160
lexeme (Intercept) 0.01362 0.1167
Residual 0.07193 0.2682
Number of obs: 3271, groups: workerid, 175; lexeme, 20
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 8.630e-03 3.114e-02 3.549e+01 0.277 0.78328
ctrial_congruency 5.433e-02 9.412e-03 3.072e+03 5.773 8.58e-09 ***
ctrial_gender 1.003e-02 9.412e-03 3.072e+03 1.065 0.28676
cgender_total 4.962e-02 1.708e-02 1.712e+02 2.906 0.00415 **
ctrial_congruency:ctrial_gender -2.322e-02 1.883e-02 3.072e+03 -1.233 0.21755
ctrial_congruency:cgender_total 1.411e-02 9.410e-03 3.072e+03 1.499 0.13386
ctrial_gender:cgender_total -1.066e-02 9.425e-03 3.073e+03 -1.131 0.25822
ctrial_congruency:ctrial_gender:cgender_total -1.067e-02 1.883e-02 3.072e+03 -0.567 0.57094
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) ctrl_c ctrl_g cgndr_ ctrl_cngrncy:ct_ ctrl_cngrncy:cg_ ctrl_g:_
ctrl_cngrnc 0.000
ctrial_gndr 0.000 -0.003
cgender_ttl -0.003 0.000 0.000
ctrl_cngrncy:ct_ 0.000 0.000 0.001 0.000
ctrl_cngrncy:cg_ 0.000 0.000 -0.001 0.001 0.000
ctrl_gndr:_ 0.000 0.000 0.001 0.001 -0.001 -0.001
ctrl_cn:_:_ 0.000 0.001 -0.001 0.001 -0.001 0.004 0.005
Data Read-in
prod_data <- read.csv("production_data.csv")
Exclusions
prod_exclusion <- prod_data %>% filter(name=='attention') %>%
group_by(workerid) %>%
summarise(accuracy = mean(correct)) %>%
mutate(exclude = ifelse(accuracy < 0.80,'Yes','No')) %>%
filter(exclude == "Yes")
prod_data <- prod_data[!(prod_data$workerid %in% prod_exclusion$workerid),]
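The exclusion pipeline can be sanity-checked on a minimal fake data set (the workerids and accuracy values below are hypothetical):

```r
library(dplyr)

# Two fake participants: one perfect on attention checks, one at 50%
fake <- data.frame(workerid = c(1, 1, 2, 2),
                   name     = "attention",
                   correct  = c(1, 1, 1, 0))

# Same logic as the exclusion above: mean attention-check accuracy
# per participant, excluded if below the 80% threshold
fake_exclusion <- fake %>%
  filter(name == "attention") %>%
  group_by(workerid) %>%
  summarise(accuracy = mean(correct)) %>%
  filter(accuracy < 0.80)

kept <- fake[!(fake$workerid %in% fake_exclusion$workerid), ]
unique(kept$workerid)  # only the participant above the cutoff remains
</imports>
```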
Additional Information
prod_data <- prod_data %>%
mutate(gender_trans = 100 - (rowMeans(prod_data[gender_transcendence_cols]))) %>%
mutate(gender_link = rowMeans(prod_data[gender_linked_cols]))
gender_all = c('gender_trans','gender_link')
prod_data <- prod_data %>%
mutate(gender_total = rowMeans(prod_data[gender_all]))
prod_data <- prod_data %>%
filter(type == "critical") %>%
mutate(response_gender = case_when(
response %in% c("actress", "anchorwoman", "stewardess", "businesswoman", "camerawoman", "congresswoman", "craftswoman", "crewwoman", "firewoman", "forewoman", "heiress", "heroine", "hostess", "huntress", "laywoman", "policewoman", "saleswoman", "stuntwoman", "villainess", "weatherwoman") ~ "female",
response %in% c("anchor", "flight attendant", "businessperson", "camera operator", "congressperson", "craftsperson", "crewmember", "firefighter", "foreperson", "layperson", "police officer", "salesperson", "stunt double", "meteorologist") ~ "neutral",
response %in% c("anchorman", "steward", "businessman", "cameraman", "congressman", "craftsman", "crewman", "fireman", "foreman", "layman", "policeman", "salesman", "stuntman", "weatherman") ~ "male",
TRUE ~ "male/neutral")) %>%
mutate(congruency = ifelse(gender == response_gender,"true","false")) %>%
mutate(neutrality = ifelse(response_gender == "neutral","true","false"))%>%
mutate(morph_type = ifelse(lexeme %in% c('actor','host','hunter','villain','heir','hero'),'adoption','compound')) %>%
mutate(poli_party = ifelse(subject_information.party_alignment == 1 | subject_information.party_alignment == 2,'Republican',ifelse(subject_information.party_alignment == 4 | subject_information.party_alignment == 5,'Democrat','Non-Partisan')))
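To confirm that the congruency and neutrality coding behaves as intended, here is a minimal check on hypothetical stimulus/response pairs (the `check` data frame is invented for illustration):

```r
library(dplyr)

# Hypothetical rows: a congruent female response, an incongruent
# neutral response, and a congruent male response
check <- data.frame(gender          = c("female", "male", "male"),
                    response_gender = c("female", "neutral", "male"))

check <- check %>%
  mutate(congruency = ifelse(gender == response_gender, "true", "false"),
         neutrality = ifelse(response_gender == "neutral", "true", "false"))

check
```

Note that a neutral response to a gendered stimulus name comes out as incongruent under this coding, which is why neutrality is tracked as a separate column.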
Responses by Political Ideology
prod_data %>%
filter(!is.na(poli_party)) %>%
filter(morph_type =="compound") %>%
ggplot(aes(x=poli_party, fill=response_gender)) +
geom_bar(position="fill") +
facet_wrap(~gender) +
scale_fill_manual(values = bran_palette) +
labs(x="Participant Political Party", fill="Gender of Response", y="Proportion of Responses", title="Gender of Response by Gender of Stimulus Name") +
theme(text=element_text(size=15)) +
theme(axis.text.x = element_text(angle=25))
ggsave("prod_all_poli.png", width=7,height=5,path='/Users/branpap/Desktop/gender_processing/talks_and_papers/qp_paper/figures')
prod_data %>%
filter(!is.na(subject_information.party_alignment)) %>%
filter(morph_type =="compound") %>%
ggplot(aes(x=subject_information.party_alignment, fill=response_gender)) +
geom_bar(position="fill") +
facet_wrap(~gender) +
scale_fill_manual(values = bran_palette) +
labs(x="Participant Political Alignment (1 = Republican, 5 = Democrat)", fill="Gender of Response", y="Proportion of Responses", title="Gender of Response by Gender of Stimulus Name")
Gender of Response by Political Alignment and Gender Ideology
prod_data %>%
filter(!is.na(poli_party)) %>%
mutate(response_neutral = ifelse(response_gender == "neutral",1,0)) %>%
filter(gender!="filler" & gender!= "attention" & gender!="" & morph_type=="compound") %>%
group_by(gender,gender_total,poli_party) %>%
summarise(proportion = mean(response_neutral)) %>%
ggplot(aes(x=gender_total, y=proportion, color=gender)) +
geom_point() +
geom_smooth() +
scale_color_manual(values = bran_palette) +
facet_wrap(~poli_party) +
labs(x="Gender Ideology Score", y="Proportion of Gender Neutral Responses",color="Gender of Name Seen") +
theme(text=element_text(size=15))
ggsave("prod_neutral_poli.png", width=10,height=5,path='/Users/branpap/Desktop/gender_processing/talks_and_papers/qp_paper/figures')
prod_data %>%
filter(!is.na(poli_party)) %>%
mutate(response_neutral = ifelse(response_gender == "neutral",1,0)) %>%
filter(gender!="filler" & gender!= "attention" & gender!="") %>%
group_by(gender,subject_information.age,poli_party) %>%
summarise(proportion = mean(response_neutral)) %>%
ggplot(aes(x=subject_information.age, y=proportion, color=gender)) +
geom_point() +
geom_smooth() +
scale_color_manual(values = bran_palette)
Warning: Removed 4 rows containing non-finite values (stat_smooth).
Warning: Removed 4 rows containing missing values (geom_point).
prod_data %>%
filter(!is.na(poli_party)) %>%
mutate(response_neutral = ifelse(response_gender == "neutral",1,0)) %>%
filter(gender!="filler" & gender!= "attention" & gender!="") %>%
group_by(gender,workerid,poli_party) %>%
summarise(proportion = mean(response_neutral)) %>%
ggplot(aes(x=poli_party, y=proportion, fill=poli_party)) +
geom_boxplot(varwidth = TRUE) +
scale_fill_manual(values=bran_palette) +
facet_wrap(~gender) +
theme(legend.position = "none") +
labs(x="Participant Political Party", y="Proportion",title="Mean Prop. of Neutral Responses by Stimulus Gender") +
theme(text=element_text(size=16)) +
theme(axis.text.x = element_text(angle=20))
ggsave("prod_neutral_poli_box.png", width=7, height=5,path='/Users/branpap/Desktop/gender_processing/talks_and_papers/qp_paper/figures')
Response Gender by Stimulus Gender, without Ideology
prod_data %>%
filter(morph_type =="compound") %>%
ggplot(aes(x=gender, fill=response_gender)) +
geom_bar(position="fill") +
scale_fill_manual(values = bran_palette) +
labs(x="Stimulus Gender", fill="Gender of Response", y="Proportion of Responses", title="Gender of Response by Gender of Stimulus Name") +
theme_minimal()
Models
prod_data_compounds <- prod_data %>%
filter(morph_type == "compound") %>%
mutate(cgender_total = scale(gender_total)) %>%
mutate(response_congruency = as.numeric(ifelse(congruency=="true","1","0"))) %>%
mutate(cage = scale(subject_information.age)) %>%
mutate(neutrality_binary = ifelse(neutrality=="true",1,0))
final_dat <- merge(prod_data_compounds,lex_freqs,by="lexeme") %>%
mutate(neutrality_binary = ifelse(neutrality=="true",1,0)) %>%
filter(morph_type == "compound") %>%
mutate(cgender_total = scale(gender_total)) %>%
mutate(response_congruency = as.numeric(ifelse(congruency=="true","1","0"))) %>%
mutate(cage = scale(subject_information.age)) %>%
mutate(cmean_left_neutral = scale(mean_left_neutral)) %>%
mutate(mean_all = (mean_left_neutral + mean_right_neutral)/2) %>%
mutate(cmean_all = scale(mean_all))
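`scale()` in the chains above z-scores each predictor, i.e. it centers and divides by the standard deviation. Its behavior on a small assumed vector:

```r
# scale() returns (x - mean(x)) / sd(x) as a one-column matrix
x  <- c(1, 2, 3, 4, 5)
zx <- as.numeric(scale(x))

mean(zx)  # ~0: centering removes the mean
sd(zx)    # 1: unit variance after dividing by sd(x)
```

Wrapping the result in `as.numeric()` drops the matrix attributes that `scale()` attaches, which can otherwise trip up downstream merges and model formulas.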
production_model_one <- lmer(neutrality_binary~cgender_total + poli_party + gender + cmean_all + (1|workerid) + (1|lexeme) + (1|name),data=final_dat)
summary(production_model_one)
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: neutrality_binary ~ cgender_total + poli_party + gender + cmean_all + (1 | workerid) + (1 | lexeme) + (1 | name)
Data: final_dat
REML criterion at convergence: 4017.5
Scaled residuals:
Min 1Q Median 3Q Max
-2.7141 -0.6983 -0.1105 0.7911 2.9453
Random effects:
Groups Name Variance Std.Dev.
workerid (Intercept) 0.0309677 0.1760
name (Intercept) 0.0003027 0.0174
lexeme (Intercept) 0.0632691 0.2515
Residual 0.1481350 0.3849
Number of obs: 3822, groups: workerid, 273; name, 24; lexeme, 14
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 0.55630 0.07029 14.08933 7.915 1.49e-06 ***
cgender_total -0.05448 0.01378 269.10221 -3.954 9.84e-05 ***
poli_partyNon-Partisan -0.04567 0.03909 269.03597 -1.168 0.244
poli_partyRepublican -0.11824 0.02960 269.00650 -3.994 8.38e-05 ***
gendermale -0.11708 0.01444 21.90517 -8.105 4.90e-08 ***
cmean_all 0.03832 0.06752 12.00057 0.567 0.581
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) cgndr_ pl_N-P pl_prR gndrml
cgender_ttl 0.084
pl_prtyNn-P -0.126 -0.110
pl_prtyRpbl -0.192 -0.444 0.327
gendermale -0.103 -0.010 0.001 0.003
cmean_all 0.000 0.000 0.000 0.000 0.002
table(prod_data$subject_information.gender)
        Female   Male  Other 
    40    3380   1980     80 
prod_gender_table <- prod_data %>%
group_by(workerid,subject_information.gender,poli_party) %>%
summarise(subject_gender = paste(unique(subject_information.gender)))
table(prod_gender_table$subject_gender,prod_gender_table$poli_party)
         Democrat Non-Partisan Republican
              1            0          1
  Female     82           25         62
  Male       42           10         46
  Other       4            0          0
prod_data_all <- read.csv("production_data.csv") %>%
filter(type=="filler_semantic" | type=="filler_grammatical") %>%
distinct(lexeme, type)
table(prod_data_all$type)
filler_grammatical filler_semantic
20 30
It is my hope and intention that this color palette be color-blind friendly. If you have accessibility concerns, please do not hesitate to reach out to me!↩︎
This amounts to an effective hourly rate of $20.73. We originally anticipated that participants would take an average of 7 minutes to complete the experiment, and set the base payment assuming a rate of $15 an hour.↩︎